
    Reinforcement Learning for the Unit Commitment Problem

    In this work we solve the day-ahead unit commitment (UC) problem by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling. We present two reinforcement learning algorithms and devise a third one. We compare our results to previous work that uses simulated annealing (SA), and show a 27% improvement in operation costs, with a running time of 2.5 minutes (compared to 2.5 hours for the existing state of the art). Comment: Accepted and presented at IEEE PES PowerTech, Eindhoven 2015, paper ID 46273.
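
    A minimal sketch of the kind of MDP formulation described above, under illustrative assumptions: states encode the hour and the current on/off status of each unit, actions are commitment vectors for the next hour, and the reward is the negative of fuel, start-up, and load-shedding costs. The two-unit system, the cost figures, and the tabular Q-learning loop are placeholders, not the paper's algorithms.

    import itertools
    import random
    from collections import defaultdict

    N_UNITS = 2
    HOURS = 24
    CAPACITY = [60.0, 40.0]        # MW per unit (assumed)
    FUEL_COST = [20.0, 35.0]       # $/MWh (assumed)
    STARTUP_COST = [500.0, 100.0]  # $ per start (assumed)
    DEMAND = [50 + 30 * (8 <= h < 20) for h in range(HOURS)]  # toy demand profile
    PENALTY = 1e4                  # $ per MW of unserved load (assumed)

    ACTIONS = list(itertools.product([0, 1], repeat=N_UNITS))  # all on/off commitment vectors

    def step(hour, status, action):
        """Return (next_state, reward) for committing `action` during `hour`."""
        startup = sum(STARTUP_COST[i] for i in range(N_UNITS) if action[i] and not status[i])
        avail = sum(CAPACITY[i] for i in range(N_UNITS) if action[i])
        remaining = min(avail, DEMAND[hour])
        cost = startup + PENALTY * max(0.0, DEMAND[hour] - avail)
        for i in sorted(range(N_UNITS), key=lambda i: FUEL_COST[i]):  # merit-order dispatch
            if action[i]:
                g = min(CAPACITY[i], remaining)
                cost += FUEL_COST[i] * g
                remaining -= g
        return (hour + 1, action), -cost

    Q = defaultdict(float)
    lr, gamma, eps = 0.1, 1.0, 0.2
    for episode in range(2000):
        state = (0, (0,) * N_UNITS)            # (hour, unit statuses)
        while state[0] < HOURS:
            a = random.choice(ACTIONS) if random.random() < eps else \
                max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, r = step(state[0], state[1], a)
            best_next = 0.0 if nxt[0] == HOURS else max(Q[(nxt, b)] for b in ACTIONS)
            Q[(state, a)] += lr * (r + gamma * best_next - Q[(state, a)])
            state = nxt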

    Chance-Constrained Outage Scheduling using a Machine Learning Proxy

    Outage scheduling aims at defining, over a horizon of several months to years, when different components needing maintenance should be taken out of operation. Its objective is to minimize the expected operating cost while satisfying reliability-related constraints. We propose a distributed, scenario-based, chance-constrained optimization formulation for this problem. To tackle tractability issues arising in large networks, we use machine learning to build a proxy for predicting outcomes of power system operation processes in this context. On the IEEE-RTS79 and IEEE-RTS96 networks, our solution obtains cheaper and more reliable plans than other candidates.
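
    A minimal sketch of the proxy idea, under stated assumptions: an expensive operation-process simulator labels (outage plan, scenario) pairs offline, a classifier is trained as a proxy, and candidate plans are then screened against an empirical chance constraint using only the proxy. The simulator stand-in, the feature encoding, the random-forest choice, and the 95% reliability target are illustrative, not the paper's pipeline.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    n_components, n_scenarios = 10, 8

    def simulate_operation(plan, scenario):
        """Stand-in for the expensive power-system operation process.
        Returns 1 if operation stays within limits, 0 otherwise (toy rule)."""
        stress = plan.sum() * 0.1 + scenario.sum() * 0.05 + rng.normal(0, 0.1)
        return int(stress < 0.55)

    # Offline phase: label a training set by running the expensive process.
    plans = rng.integers(0, 2, size=(500, n_components))   # which components are taken out
    scenarios = rng.normal(size=(500, n_scenarios))        # load / renewable scenarios
    X = np.hstack([plans, scenarios])
    y = np.array([simulate_operation(p, s) for p, s in zip(plans, scenarios)])
    proxy = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Online phase: screen a candidate plan against sampled scenarios with the proxy only.
    def satisfies_chance_constraint(plan, sampled_scenarios, target=0.95):
        feats = np.hstack([np.tile(plan, (len(sampled_scenarios), 1)), sampled_scenarios])
        ok_prob = proxy.predict_proba(feats)[:, 1]   # P(operation within limits) per scenario
        return (ok_prob > 0.5).mean() >= target      # empirical chance constraint

    candidate = rng.integers(0, 2, size=n_components)
    print(satisfies_chance_constraint(candidate, rng.normal(size=(200, n_scenarios))))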

    Beyond the One Step Greedy Approach in Reinforcement Learning

    The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has to our knowledge not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions, and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study. Comment: ICML 2018.
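
    A minimal sketch, under illustrative assumptions, of the h-step greedy improvement step analyzed above: on a small known MDP, the improved policy takes the first action of the best h-horizon plan, bootstrapping with the current value estimate V at the horizon (h = 1 recovers the usual greedy step of Policy Iteration). The random MDP and the choice h = 3 are placeholders.

    import numpy as np

    rng = np.random.default_rng(1)
    S, A, gamma = 5, 3, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
    R = rng.uniform(size=(S, A))                 # immediate rewards

    def h_step_value(s, V, h):
        """Optimal h-horizon return from state s, bootstrapping with V at the horizon."""
        if h == 0:
            return V[s]
        return max(R[s, a] + gamma * sum(P[s, a, t] * h_step_value(t, V, h - 1) for t in range(S))
                   for a in range(A))

    def h_step_greedy(V, h):
        """First action of the best h-step plan from each state."""
        Q = [[R[s, a] + gamma * sum(P[s, a, t] * h_step_value(t, V, h - 1) for t in range(S))
              for a in range(A)] for s in range(S)]
        return np.argmax(Q, axis=1)

    def evaluate(policy):
        """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi."""
        P_pi = np.array([P[s, policy[s]] for s in range(S)])
        R_pi = np.array([R[s, policy[s]] for s in range(S)])
        return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

    # Policy Iteration with an h-step greedy improvement stage (h = 3 here).
    V = np.zeros(S)
    for _ in range(10):
        policy = h_step_greedy(V, h=3)
        V = evaluate(policy)
    print(policy, V)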

    Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning

    Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work \cite{efroni2018beyond}, multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty arising with soft-policy updates: even in the absence of approximations, and contrary to the 1-step-greedy case, monotonic policy improvement is not guaranteed unless the update stepsize is sufficiently large. Taking particular care about this difficulty, we formulate and analyze online and approximate algorithms that use such a multi-step greedy operator. Comment: NIPS 2018.
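
    A minimal sketch, on an assumed toy MDP, of the soft-policy update discussed above: the new policy mixes a 2-step greedy policy into the old one, pi_new = (1 - step) * pi_old + step * pi_greedy, and the evaluated values can be compared across stepsizes to check whether the improvement is monotone. The MDP, the 2-step lookahead choice, and the stepsizes tried are illustrative.

    import numpy as np

    rng = np.random.default_rng(2)
    S, A, gamma = 4, 2, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel P[s, a] over next states
    R = rng.uniform(size=(S, A))                 # immediate rewards

    def evaluate(pi):
        """Exact value of a stochastic policy pi (shape S x A)."""
        P_pi = np.einsum('sa,sat->st', pi, P)
        R_pi = np.einsum('sa,sa->s', pi, R)
        return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

    def two_step_greedy(V):
        """Deterministic 2-step greedy policy w.r.t. V, returned as a one-hot matrix."""
        Q1 = R + gamma * P @ V          # 1-step lookahead Q-values
        V1 = Q1.max(axis=1)             # optimal values after one step of lookahead
        Q2 = R + gamma * P @ V1         # 2-step lookahead Q-values
        pi = np.zeros((S, A))
        pi[np.arange(S), Q2.argmax(axis=1)] = 1.0
        return pi

    pi_old = np.full((S, A), 1.0 / A)   # start from the uniform policy
    V_old = evaluate(pi_old)
    pi_greedy = two_step_greedy(V_old)
    for step in (0.1, 0.5, 1.0):        # small vs. large update stepsizes
        pi_new = (1 - step) * pi_old + step * pi_greedy
        print(step, np.all(evaluate(pi_new) >= V_old - 1e-12))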

    Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs

    As communication protocols evolve, datacenter network utilization increases. As a result, congestion is more frequent, causing higher latency and packet loss. Combined with the increasing complexity of workloads, manual design of congestion control (CC) algorithms becomes extremely difficult. This calls for the development of AI approaches to replace the human effort. Unfortunately, it is currently not possible to deploy AI models on network devices due to their limited computational capabilities. Here, we address this problem by building a computationally light solution based on a recent reinforcement learning CC algorithm [arXiv:2207.02295]. We reduce the inference time of RL-CC by a factor of 500 by distilling its complex neural network into decision trees. This transformation enables real-time inference within the microsecond decision-time requirement, with a negligible effect on quality. We deploy the transformed policy on NVIDIA NICs in a live cluster. Compared to popular CC algorithms used in production, RL-CC is the only method that performs well on all benchmarks tested, over a wide range of numbers of flows. It balances multiple metrics simultaneously: bandwidth, latency, and packet drops. These results suggest that data-driven methods for CC are feasible, challenging the prior belief that handcrafted heuristics are necessary to achieve optimal performance.
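
    A minimal sketch of the distillation step, with an assumed stand-in teacher: observations are mapped by a teacher policy (here a hand-written function; in the work above it would be the trained RL-CC network) to rate adjustments, a dataset of teacher decisions is collected, and a shallow decision tree is fit to imitate it, after which fidelity and batch-inference latency are measured. The observation features, tree depth, and dataset sizes are illustrative, not NVIDIA's deployment path.

    import time
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(3)

    def teacher_policy(obs):
        """Stand-in teacher mapping CC observations (e.g. RTT, rate, congestion signal)
        to a multiplicative rate adjustment; the real teacher would be the RL-CC network."""
        rtt, rate, congestion = obs[..., 0], obs[..., 1], obs[..., 2]
        return np.clip(1.0 - 0.5 * congestion + 0.1 / (1.0 + rtt) - 0.01 * rate, 0.5, 1.5)

    # Distillation dataset: query the teacher on sampled observations.
    X = rng.uniform(0.0, 1.0, size=(50_000, 3))
    y = teacher_policy(X)

    student = DecisionTreeRegressor(max_depth=8).fit(X, y)   # small, fixed-latency model

    # Fidelity and batch-inference latency of the distilled student.
    X_test = rng.uniform(0.0, 1.0, size=(10_000, 3))
    err = np.abs(student.predict(X_test) - teacher_policy(X_test)).mean()
    t0 = time.perf_counter()
    student.predict(X_test)
    print(f"mean abs error {err:.4f}, batch inference {time.perf_counter() - t0:.4f}s")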

    A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound

    Policy evaluation in reinforcement learning is often conducted using two-timescale stochastic approximation, which results in various gradient temporal difference methods such as GTD(0), GTD2, and TDC. Here, we provide convergence rate bounds for this suite of algorithms. Algorithms such as these have two iterates, $\theta_n$ and $w_n$, which are updated using two distinct stepsize sequences, $\alpha_n$ and $\beta_n$, respectively. Assuming $\alpha_n = n^{-\alpha}$ and $\beta_n = n^{-\beta}$ with $1 > \alpha > \beta > 0$, we show that, with high probability, the two iterates converge to their respective solutions $\theta^*$ and $w^*$ at rates given by $\|\theta_n - \theta^*\| = \tilde{O}(n^{-\alpha/2})$ and $\|w_n - w^*\| = \tilde{O}(n^{-\beta/2})$; here, $\tilde{O}$ hides logarithmic terms. Via comparable lower bounds, we show that these bounds are, in fact, tight. To the best of our knowledge, ours is the first finite-time analysis which achieves these rates. While it was known that the two timescale components decouple asymptotically, our results depict this phenomenon more explicitly by showing that it in fact happens from some finite time onwards. Lastly, compared to existing works, our result applies to a broader family of stepsizes, including non-square-summable ones.
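
    A minimal sketch, under assumed toy dynamics, of the two-timescale updates the bounds above refer to: the TDC iterates $\theta_n$ (slow, stepsize $\alpha_n = n^{-\alpha}$) and $w_n$ (fast, stepsize $\beta_n = n^{-\beta}$, with $1 > \alpha > \beta > 0$) are run on a small chain-like Markov reward process with linear features. The process, the features, and the exponents 0.75 and 0.5 are illustrative.

    import numpy as np

    rng = np.random.default_rng(4)
    S, d, gamma = 5, 3, 0.95
    phi = rng.normal(size=(S, d))              # fixed linear features per state
    alpha_exp, beta_exp = 0.75, 0.5            # stepsize exponents, 1 > alpha > beta > 0

    theta = np.zeros(d)                        # slow iterate: value-function weights
    w = np.zeros(d)                            # fast iterate: auxiliary correction weights
    s = 0
    for n in range(1, 100_000):
        s_next = (s + rng.integers(1, 3)) % S  # toy transition kernel
        r = float(s_next == 0)                 # reward when the chain returns to state 0
        f, f_next = phi[s], phi[s_next]
        delta = r + gamma * theta @ f_next - theta @ f             # TD error
        alpha_n, beta_n = n ** -alpha_exp, n ** -beta_exp          # two-timescale stepsizes
        theta += alpha_n * (delta * f - gamma * (w @ f) * f_next)  # TDC update for theta
        w += beta_n * (delta - w @ f) * f                          # faster update for w
        s = s_next
    print(theta, w)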